1.
J Chem Inf Model ; 61(10): 4913-4923, 2021 10 25.
Article in English | MEDLINE | ID: mdl-34554736

ABSTRACT

Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach considering multiple conformations in model training could be a reasonable solution to the above problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors computed for a single lowest-energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.


Subjects
Algorithms; Quantitative Structure-Activity Relationship; Databases, Factual; Drug Discovery; Molecular Conformation
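
The multi-instance idea in the abstract above (one molecule treated as a bag of conformations, with the model indicating which conformation matters most) can be sketched as follows. This is a minimal illustration only: softmax attention pooling is an assumed aggregation scheme, and `attention_pool`, the per-conformer predictions and the attention logits are hypothetical names and values, not the algorithms or data of the cited study.

```python
import math
from typing import List, Tuple


def attention_pool(instance_preds: List[float],
                   instance_logits: List[float]) -> Tuple[float, List[float], int]:
    """Pool per-conformer predictions into one bag-level (molecule-level) prediction.

    instance_preds  : hypothetical activity predicted for each conformer
    instance_logits : hypothetical unnormalized attention score for each conformer
    Returns the bag prediction, the softmax weights, and the index of the
    highest-weighted conformer (a candidate "key" conformation).
    """
    if not instance_preds or len(instance_preds) != len(instance_logits):
        raise ValueError("need one logit per conformer and at least one conformer")
    top = max(instance_logits)                      # subtract max for numerical stability
    exps = [math.exp(z - top) for z in instance_logits]
    total = sum(exps)
    weights = [e / total for e in exps]             # softmax over conformers
    bag_pred = sum(w * p for w, p in zip(weights, instance_preds))
    key = max(range(len(weights)), key=weights.__getitem__)
    return bag_pred, weights, key


# Toy bag of three conformers (made-up numbers)
preds = [6.0, 7.0, 5.0]
logits = [0.0, 2.0, 0.0]
bag_pred, weights, key_conformer = attention_pool(preds, logits)
```

In this sketch the conformer with the largest weight is the one a model of this kind would flag as the plausible bioactive conformation.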
2.
Sci Rep ; 11(1): 3178, 2021 02 04.
Article in English | MEDLINE | ID: mdl-33542271

ABSTRACT

The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures has opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on purpose-built "SMILES/CGR" strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, thereby extending the synthetic purpose of popular synthetic pathways.

4.
Int J Mol Sci ; 21(15), 2020 Aug 03.
Article in English | MEDLINE | ID: mdl-32756326

ABSTRACT

Defining a model's applicability domain (AD) is an active research topic in chemoinformatics. Although many AD definitions for models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The QRPR models' performance largely depends on the way that chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. Their ability to exclude wrong reaction types, increase coverage, improve the model performance and detect Y-outliers was tested. As a result, several "best" AD definitions for the QRPR models predicting reaction characteristics have been revealed and tested on a previously published external dataset with a clear AD definition problem.


Subjects
Cheminformatics/trends; Protein Domains; Quantitative Structure-Activity Relationship; Thermodynamics; Chemical Phenomena; Kinetics; Models, Molecular
5.
Chem Soc Rev ; 49(11): 3525-3564, 2020 06 07.
Article in English | MEDLINE | ID: mdl-32356548

ABSTRACT

Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and, more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modeling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.


Subjects
Chemistry, Pharmaceutical/methods; Drug-Related Side Effects and Adverse Reactions/metabolism; Pharmaceutical Preparations/chemistry; Algorithms; Animals; Artificial Intelligence; Databases, Factual; Drug Design; History, 20th Century; History, 21st Century; Humans; Models, Molecular; Quantitative Structure-Activity Relationship; Quantum Theory; Reproducibility of Results
7.
Mol Inform ; 39(12): e2000009, 2020 12.
Article in English | MEDLINE | ID: mdl-32347666

ABSTRACT

Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the Frame set (FS) of compounds used for the manifold construction must cover that chemical space well. Intuitively, the FS size must grow with the size and diversity of the target library. At the same time, GTM training can be very slow or even become technically impossible at FS sizes of the order of 10^5 compounds, which is a very small number compared to today's commercially accessible compounds and, especially, to the theoretically feasible molecules. In order to solve this problem, we propose a Parallel GTM algorithm based on the merging of "intermediate" manifolds constructed in parallel for different subsets of molecules. An ensemble of these subsets forms an FS for the "final" manifold. In order to assess the efficiency of the new algorithm, 80 GTMs were built on FSs of different sizes ranging from 10 to 1.8 M compounds selected from the ChEMBL database. Each GTM was challenged to build classification models for up to 712 biological activities (depending on the FS size). With the novel parallel GTM procedure, we could thus cover the entire spectrum of possible FS sizes, whereas previous studies were forced to rely on the working hypothesis that FS sizes of a few thousand compounds are sufficient to describe the ChEMBL chemical space. In fact, this study formally proves this to be true: an FS containing only 5000 randomly picked compounds is sufficient to represent the entire ChEMBL collection (1.8 M molecules), in the sense that a further increase of FS compound numbers has no beneficial impact on the predictive propensity of the above-mentioned 712 activity classification models. Parallel GTM may, however, be required to generate maps based on very large FSs, which might improve chemical space cartography of big commercial and virtual libraries, approaching billions of compounds.


Subjects
Algorithms; Big Data; Benchmarking; Databases, Chemical; Entropy
8.
Expert Opin Drug Discov ; 15(7): 755-764, 2020 07.
Article in English | MEDLINE | ID: mdl-32228116

ABSTRACT

INTRODUCTION: Deep discriminative and generative neural-network models are becoming an integral part of the modern approach to ligand-based novel drug discovery. The variety of different architectures of neural networks, the methods of their training, and the procedures of generating new molecules require expert knowledge to choose the most suitable approach. AREAS COVERED: Three different approaches to deep learning use in ligand-based drug discovery are considered: virtual screening, neural generative models, and mutation-based structure generation. Several architectures of neural networks for building either discriminative or generative models are considered in this paper, including deep multilayer neural networks, different kinds of convolutional neural networks, recurrent neural networks, and several types of autoencoders. Several kinds of learning frameworks are also considered, including adversarial learning and reinforcement learning. Different types of representations for generating molecules, including SMILES, graphs, and several alternative string representations are also considered. EXPERT OPINION: Two kinds of problem should be solved in order to make the models built using deep neural networks, especially generative models, a valuable option in ligand-based drug discovery: the issue of interpretability and explainability of deep-learning models and the issue of synthetic accessibility of novel compounds designed by deep-learning algorithms.


Subjects
Deep Learning; Drug Discovery/methods; Neural Networks, Computer; Algorithms; Drug Design; Humans; Ligands
9.
Mol Inform ; 39(6): e1900170, 2020 06.
Article in English | MEDLINE | ID: mdl-32090493

ABSTRACT

Generative Topographic Mapping (GTM) is a dimensionality reduction method, which is widely used for both data visualization and structure-activity modeling. Large dimensionality of the initial data space may require significant computational resources and slow down the GTM construction. Therefore, it may be meaningful to reduce the number of descriptors used for encoding molecular structures. Principal Component Analysis (PCA), a standard preprocessing tool, suffers from information loss upon dimensionality reduction. As an alternative, we propose to use substructure vector embedding provided by the mol2vec technique. In addition to the data dimensionality reduction, this technology also accounts for proximity of substructures in molecular graphs. In this study, the dimensionality of large descriptor spaces of ISIDA fragment descriptors or Morgan fingerprints was reduced using either the PCA or the mol2vec method. The latter significantly speeds up GTM training without compromising its predictive power in bioactivity classification tasks.


Subjects
Algorithms; Data Analysis; Data Visualization; Principal Component Analysis
10.
Future Med Chem ; 11(20): 2701-2713, 2019 10.
Article in English | MEDLINE | ID: mdl-31596146

ABSTRACT

The analysis of information on the spatial structure of molecules and the physical fields of their interactions with biological targets is extremely important for solving various problems in drug discovery. This mini-review article surveys the main features of the continuous molecular fields approach and its use for analyzing structure-activity relationships in 3D space, building 3D quantitative structure-activity models and conducting similarity based virtual screening. Particular attention is paid to the consideration of the concept of molecular co-fields and their use for the interpretation of 3D structure-activity models. The principles of molecular design based on the overlapping and the similarity of molecular fields with corresponding co-fields are formulated.


Subjects
Molecular Structure; Hydrogen Bonding; Hydrophobic and Hydrophilic Interactions; Models, Molecular; Structure-Activity Relationship
11.
J Chem Inf Model ; 59(11): 4569-4576, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31638794

ABSTRACT

Here, we describe a concept of conjugated models for several properties (activities) linked by a strict mathematical relationship. This relationship can be directly integrated analytically into the ridge regression (RR) algorithm or accounted for in a special case of "twin" neural networks (NN). Developed approaches were applied to the modeling of the logarithm of the prototropic tautomeric constant (logKT) which can be expressed as the difference between the acidity constants (pKa) of two related tautomers. Both conjugated and individual RR and NN models for logKT and pKa were developed. The modeling set included 639 tautomeric constants and 2371 acidity constants of organic molecules in various solvents. A descriptor vector for each reaction resulted from the concatenation of structural descriptors and some parameters for reaction conditions. For the former, atom-centered substructural fragments describing acid sites in tautomer molecules were used. The latter were automatically identified using the condensed graph of reaction approach. Conjugated models performed similarly to the best individual models for logKT and pKa. At the same time, the physically grounded relationship between logKT and pKa was respected only for conjugated but not individual models.


Subjects
Organic Chemicals/chemistry; Pharmaceutical Preparations/chemistry; Acids/chemistry; Algorithms; Drug Discovery; Models, Chemical; Molecular Structure; Neural Networks, Computer; Quantitative Structure-Activity Relationship; Solvents/chemistry; Stereoisomerism
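
The relationship stated in the abstract above (logKT as the difference between the pKa values of two related tautomers) suggests how a single ridge regression could serve both properties. The sketch below is one possible formulation under stated assumptions: a linear pKa model with weights w and intercept b, descriptor vectors x for each tautomer (including any condition parameters), a particular sign convention for the difference, and a regularization strength lambda. None of these symbols or choices is taken from the cited paper.

```latex
\begin{align}
% Assumed linear model for the acidity constant of tautomer t
\mathrm{p}K_a(t) &= \mathbf{w}^{\top}\mathbf{x}_t + b \\
% Assumed sign convention; the intercept cancels in the difference
\log K_T &= \mathrm{p}K_a(t_1) - \mathrm{p}K_a(t_2)
          = \mathbf{w}^{\top}\left(\mathbf{x}_{t_1} - \mathbf{x}_{t_2}\right) \\
% Joint ridge objective over acidity data (index i) and tautomeric data (index j)
\min_{\mathbf{w},\, b}\;
  &\sum_{i}\left(\mathrm{p}K_{a,i} - \mathbf{w}^{\top}\mathbf{x}_i - b\right)^2
  + \sum_{j}\left(\log K_{T,j} - \mathbf{w}^{\top}\left(\mathbf{x}_{j,1} - \mathbf{x}_{j,2}\right)\right)^2
  + \lambda \lVert \mathbf{w} \rVert^2
\end{align}
```

Because both properties share the same weight vector in this formulation, the predicted logKT always equals the difference of the two predicted pKa values, which is the consistency property the abstract attributes to conjugated models.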
13.
J Chem Inf Model ; 59(3): 1182-1196, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30785751

ABSTRACT

Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).


Subjects
Deep Learning; Drug Design; Catalytic Domain; Drug Evaluation, Preclinical; Ligands; Molecular Docking Simulation; Receptor, Adenosine A2A/chemistry; Receptor, Adenosine A2A/metabolism; Small Molecule Libraries/metabolism; Small Molecule Libraries/pharmacology
14.
Mol Inform ; 37(9-10): e1800056, 2018 09.
Article in English | MEDLINE | ID: mdl-30039933

ABSTRACT

The Generative Topographic Mapping (GTM) approach was successfully used to visualize, analyze and model the equilibrium constants (KT) of tautomeric transformations as a function of both structure and experimental conditions. The modeling set contained 695 entries corresponding to 350 unique transformations of 10 tautomeric types, for which KT values were measured in different solvents and at different temperatures. Two types of GTM-based classification models were trained: first, a "structural" approach focused on separating tautomeric classes, irrespective of reaction conditions, then a "general" approach accounting for both structure and conditions. In both cases, the cross-validated Balanced Accuracy was close to 1 and the clusters, assembling equilibria of particular classes, were well separated in the 2-dimensional GTM latent space. Data points corresponding to similar transformations measured under different experimental conditions are well separated on the maps. Additionally, the predictive performance of GTM-driven regression models was found to depend on the scenario used to select local fragment descriptors involving special marked atoms (proton donors or acceptors). The application of local descriptors significantly improves the model performance in 5-fold cross-validation: RMSE=0.63 and 0.82 logKT units with and without local descriptors, respectively. This trend was also observed for SVR calculations, performed for comparison purposes.


Subjects
Algorithms; Molecular Dynamics Simulation; Organic Chemicals/chemistry; Isomerism; Solvents/chemistry
15.
Methods Mol Biol ; 1800: 119-139, 2018.
Article in English | MEDLINE | ID: mdl-29934890

ABSTRACT

Various methods of machine learning, supervised and unsupervised, linear and nonlinear, classification and regression, in combination with various types of molecular descriptors, both "handcrafted" and "data-driven," are considered in the context of their use in computational toxicology. The use of multiple linear regression, variants of naïve Bayes classifier, k-nearest neighbors, support vector machine, decision trees, ensemble learning, random forest, several types of neural networks, and deep learning is the focus of attention of this review. The role of fragment descriptors, graph mining, and graph kernels is highlighted. The application of unsupervised methods, such as Kohonen's self-organizing maps and related approaches, which allow for combining predictions with data analysis and visualization, is also considered. The necessity of applying a wide range of machine learning methods in computational toxicology is underlined.


Subjects
Computer Simulation; Machine Learning; Toxicology/methods; Algorithms; Deep Learning; Linear Models; Neural Networks, Computer; Quantitative Structure-Activity Relationship; Support Vector Machine
16.
J Comput Aided Mol Des ; 31(8): 701-714, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28688089

ABSTRACT

Generative topographic mapping (GTM) approach is used to visualize the chemical space of organic molecules (L) with respect to binding a wide range of 41 different metal cations (M) and also to build predictive models for stability constants (logK) of 1:1 (M:L) complexes using "density maps," "activity landscapes," and "selectivity landscapes" techniques. A two-dimensional map describing the entire set of 2962 metal binders reveals the selectivity and promiscuity zones with respect to individual metals or groups of metals with similar chemical properties (lanthanides, transition metals, etc). The GTM-based global (for entire set) and local (for selected subsets) models demonstrate a good predictive performance in the cross-validation procedure. It is also shown that the data likelihood could be used as a definition of the applicability domain of GTM-based models. Thus, the GTM approach represents an efficient tool for the predictive cartography of metal binders, which can both visualize their chemical space and predict the affinity profile of metals for new ligands.


Subjects
Chelating Agents/chemistry; Coordination Complexes/chemistry; Metals/chemistry; Algorithms; Computer Simulation; Ligands; Likelihood Functions; Molecular Structure; Structure-Activity Relationship; Thermodynamics
17.
Mol Inform ; 36(11), 2017 11.
Article in English | MEDLINE | ID: mdl-28627811

ABSTRACT

In Energy-Based Neural Networks (EBNNs), relationships between variables are captured by means of a scalar function conventionally called "energy". In this article, we introduce a procedure of "harmony search", which looks for compounds providing the lowest energies for the EBNNs trained on active compounds. It can be considered as a special kind of similarity search that takes into account regularities in the structures of active compounds. In this paper, we show that harmony search can be used for performing virtual screening. The performance of the harmony search based on two types of EBNNs, the Hopfield Networks (HNs) and the Restricted Boltzmann Machines (RBMs), was compared with the performance of the similarity search based on Tanimoto coefficient with "data fusion". The AUC measure for ROC curves and 1 %-enrichment rates for 20 targets were used in the benchmarking. Five different scores were computed: the energy for HNs, the free energy and the reconstruction error for RBMs, the mean and the maximum values of Tanimoto coefficients. The performance of the harmony search was shown to be comparable or even superior (significantly for several targets) to the performance of the similarity search. Important advantages of using the harmony search for virtual screening are very high computational efficiency of prediction, the ability to reveal and take into account regularities in active structures, flexibility and interpretability of models, etc.


Subjects
Neural Networks, Computer; Algorithms
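
The baseline named in the abstract above, similarity search based on the Tanimoto coefficient with data fusion using the mean and the maximum values, can be sketched as follows. This is a minimal illustration assuming fingerprints stored as sets of on-bit indices; the function names and toy fingerprints are hypothetical and do not come from the cited study.

```python
from typing import FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fused_scores(actives: List[FrozenSet[int]],
                 candidate: FrozenSet[int]) -> Tuple[float, float]:
    """Data fusion: mean and maximum similarity of a candidate to the known actives."""
    if not actives:
        raise ValueError("need at least one reference active")
    sims = [tanimoto(ref, candidate) for ref in actives]
    return sum(sims) / len(sims), max(sims)


# Toy fingerprints (made-up bit indices)
actives = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
candidate = frozenset({2, 3, 4})
mean_sim, max_sim = fused_scores(actives, candidate)
```

Ranking a screening library by either fused score gives the kind of similarity-search baseline against which the energy-based scores are compared.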
18.
Expert Opin Drug Discov ; 11(8): 785-95, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27295548

ABSTRACT

INTRODUCTION: Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. AREAS COVERED: In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models with respect to various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characteristics of drug-delivery systems and virtual screening. EXPERT OPINION: Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest further improvements may be gained in the analysis of large chemical data sets. It is anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.


Subjects
Drug Design; Drug Discovery/methods; Neural Networks, Computer; Computer-Aided Design; Drug Delivery Systems; Humans; Models, Biological; Pharmaceutical Preparations/administration & dosage; Pharmaceutical Preparations/chemistry; Structure-Activity Relationship
19.
J Chem Inf Model ; 55(11): 2403-10, 2015 Nov 23.
Article in English | MEDLINE | ID: mdl-26458083

ABSTRACT

Predicting the activity profile of a molecule or discovering structures possessing a specific activity profile are two important goals in chemoinformatics, which could be achieved by bridging activity and molecular descriptor spaces. In this paper, we introduce the "Stargate" version of the Generative Topographic Mapping approach (S-GTM) in which two different multidimensional spaces (e.g., structural descriptor space and activity space) are linked through a common 2D latent space. In the S-GTM algorithm, the manifolds are trained simultaneously in two initial spaces using the probabilities in the 2D latent space calculated as a weighted geometric mean of probability distributions in both spaces. S-GTM has the following interesting features: (1) activities are involved during the training procedure; therefore, the method is supervised, unlike conventional GTM; (2) using molecular descriptors of a given compound as input, the model predicts a whole activity profile, and (3) using an activity profile as input, areas populated by relevant chemical structures can be detected. To assess the performance of S-GTM prediction models, a descriptor space (ISIDA descriptors) of a set of 1325 GPCR ligands was related to a B-dimensional (B = 1 or 8) activity space corresponding to pKi values for eight different targets. S-GTM outperforms conventional GTM for individual activities and performs similarly to the Lasso multitask learning algorithm, although it is still slightly less accurate than the Random Forest method.


Subjects
Algorithms; Computer-Aided Design; Drug Design; Artificial Intelligence; Humans; Probability; Quantitative Structure-Activity Relationship
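
The fusion step described in the abstract above, where probabilities in the 2D latent space are calculated as a weighted geometric mean of the distributions obtained in the two spaces, can be sketched as follows. This is a minimal illustration: the weight `w`, the renormalization over latent nodes, and the name `fuse_responsibilities` are assumptions made for the example, not details confirmed by the abstract.

```python
from typing import List


def fuse_responsibilities(p_desc: List[float], p_act: List[float],
                          w: float = 0.5) -> List[float]:
    """Weighted geometric mean of two distributions over the same latent nodes.

    p_desc : node probabilities computed in descriptor space
    p_act  : node probabilities computed in activity space
    w      : hypothetical weight of the descriptor space (0 <= w <= 1)
    The result is renormalized to sum to 1 (an assumption of this sketch).
    """
    if len(p_desc) != len(p_act) or not p_desc:
        raise ValueError("both distributions must cover the same non-empty node set")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    fused = [(a ** w) * (b ** (1.0 - w)) for a, b in zip(p_desc, p_act)]
    norm = sum(fused)
    if norm == 0.0:
        raise ValueError("distributions have no overlapping support")
    return [f / norm for f in fused]


# Toy example with two latent nodes (made-up probabilities)
fused = fuse_responsibilities([0.5, 0.5], [0.9, 0.1], w=0.5)
```

Setting `w` to 1 in this sketch recovers the descriptor-space distribution alone, which corresponds to ignoring the activity space.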
20.
J Chem Inf Model ; 55(1): 84-94, 2015 Jan 26.
Article in English | MEDLINE | ID: mdl-25423612

ABSTRACT

This paper is devoted to the analysis and visualization in 2-dimensional space of large data sets of millions of compounds using the incremental version of generative topographic mapping (iGTM). The iGTM algorithm implemented in the in-house ISIDA-GTM program was applied to a database of more than 2 million compounds combining data sets of 36 chemical suppliers and the NCI collection, encoded either by MOE descriptors or by MACCS keys. Taking advantage of the probabilistic nature of GTM, several approaches to data analysis were proposed. The chemical space coverage was evaluated using the normalized Shannon entropy. Different views of the data (property landscapes) were obtained by mapping various physical and chemical properties (molecular weight, aqueous solubility, LogP, etc.) onto the iGTM map. The superposition of these views helped to identify the regions in the chemical space populated by compounds with desirable physicochemical profiles and the suppliers providing them. The similarity of data sets in the latent space was assessed by applying several metrics (Euclidean distance, Tanimoto and Bhattacharyya coefficients) to data probability distributions based on cumulated responsibility vectors. As a complementary approach, data sets were compared by considering them as individual objects on a meta-GTM map, built on cumulated responsibility vectors or property landscapes produced with iGTM. We believe that the iGTM methodology described in this article represents a fast and reliable way to analyze and visualize large chemical databases.


Subjects
Algorithms; Databases, Chemical; Entropy; Small Molecule Libraries; Solubility; User-Computer Interface
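
The coverage measure mentioned in the abstract above, normalized Shannon entropy, can be sketched as follows for a cumulated responsibility vector over map nodes. This is a minimal illustration: dividing by the logarithm of the number of nodes is an assumed normalization, and the function name and toy vectors are hypothetical.

```python
import math
from typing import List


def normalized_shannon_entropy(cumulated_resp: List[float]) -> float:
    """Entropy of a data set's distribution over map nodes, scaled to [0, 1].

    cumulated_resp : non-negative cumulated responsibility per node
    Returns 1.0 when the data set is spread evenly over all nodes and 0.0
    when it is concentrated on a single node.
    """
    n_nodes = len(cumulated_resp)
    total = sum(cumulated_resp)
    if n_nodes < 2 or total <= 0:
        raise ValueError("need at least two nodes and a positive total responsibility")
    probs = [r / total for r in cumulated_resp if r > 0]   # skip empty nodes (0 * log 0 = 0)
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n_nodes)                     # assumed normalization by log(K)


# Toy maps with four nodes (made-up values)
even_coverage = normalized_shannon_entropy([1.0, 1.0, 1.0, 1.0])
single_node = normalized_shannon_entropy([5.0, 0.0, 0.0, 0.0])
```

A value near 1 in this sketch indicates that a library populates the whole map, while a value near 0 indicates that it occupies only a small region.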